ABSTRACT



The radial velocity (RV) technique has produced spectacular discoveries of
short-period Jovian-mass objects around a fraction (5 to 10%) of nearby G stars.
Although we expect Jovian planets to be located in long-period orbits of decades or
longer (if our solar system is any guide), detecting such planets with the RV technique is
difficult due to smaller velocity amplitudes and the limited temporal baseline (5-10 yr)
of current searches relative to the expected orbital periods. In this paper, we develop
an analytical understanding of the sensitivity of the RV technique in the regime where the
orbital period is larger than the total baseline of the survey. Moreover, we focus on
the importance of the orbital phase in this "long-period" regime, and develop a Least
Squares detection technique based on the amplitude and phase of the fitted signal.

To illustrate the benefits of this amplitude-phase analysis, we compare it to
existing techniques. Previous authors (e.g. Nelson & Angel 1998) have explored the
sensitivity of an amplitude-only analysis using Monte Carlo simulations. Others have
supplemented this by also using the slope of the linear component of the fitted sinusoid
(e.g. Walker et al. 1995; Cumming et al. 1999). In this paper, we
illustrate the benefits of Least Squares over periodogram analysis, and demonstrate
the superiority of an amplitude-phase technique over previous analyses.
1. Introduction

Radial velocity surveys of nearby stars have been employed in the search for extra-solar 
planets for nearly two decades. An important element of these surveys is the sensitivity; i.e. what 
is the minimum-mass planet that can be detected at a given orbital period? This issue boils down 
to a question about how well one can detect periodic signals in a finitely sampled data set. 

There are two main regimes of analysis: the "short-period" regime, characterized by $\tau \lesssim T_0$,
and the "long-period" regime with $\tau \gtrsim T_0$. Here $\tau$ is the orbital period and $T_0$ is the duration
of the survey. The planetary objects found to date through the RV technique are short-period
objects, with periods less than a year or so (e.g. Mayor & Queloz 1995; Marcy & Butler 1996;
Fischer et al. 1998). The sensitivity of RV searches is very well understood in this short-period
regime, and compact analytical formulae exist (Lomb 1976; Scargle 1982; Horne & Baliunas
1986; Nelson & Angel 1998; Cumming, Marcy, & Butler 1999).



According to current theoretical prejudice, however, one expects giant planets to be primarily
formed in the colder regions of the proto-planetary nebula, and thus we expect such objects to
possess periods in the range of many years to centuries (Boss 1995). Unfortunately, it is well
known that the sensitivity of the RV technique is considerably worse for planets with large orbital
periods. Earlier analyses in the long-period regime have primarily concentrated on estimating the
sensitivity through simulations (e.g. Nelson & Angel 1998; Cumming et al. 1999).



This paper has two goals: to provide analytical insight into the sensitivity of the RV technique
in the long-period regime, and to address the issues of detection and detection efficiency. We note
that almost all of the previous work has concentrated on setting upper limits and has not addressed
the issue of detection. It is also worth noting that much of the previous work has been based on
the periodogram method (e.g. Scargle 1982; Horne & Baliunas 1986; Walker et al. 1995; Cumming
et al. 1999). Following Nelson & Angel (1998), we argue at various points in the paper why a
Least Squares approach is preferable to a periodogram approach. Essentially, the Least Squares
approach, in contrast to the traditional periodogram approach, offers the most general method
and needs no modifications in the long-period regime or in the sparse-data regime.

The plan of the paper is as follows. We summarize the Least Squares approach and derive
the basic equations in §2. In §3 we provide analytical estimates for obtaining a detection in the
absence of any signal in the short and long-period regimes. In §4 we carry out simulations and
obtain estimates for minimum detectable signals in the presence of noise. We conclude in §5.



2. Radial Velocity Technique: Basic Equations 

For simplicity, we will assume circular orbits throughout this discussion. A planet in a circular 
orbit undergoes acceleration, and because the linear momentum of the system must be conserved, 
the star undergoes a reflex acceleration. It is this acceleration that directly informs us of the 
presence of the companion. However, the observable is the velocity: 

$$v(t) = A \sin(2\pi t/\tau + \phi) + \gamma, \qquad (1)$$

where the amplitude, $A$, is given by

$$A = \left(\frac{2\pi G M}{\tau}\right)^{1/3} \frac{M_p \sin i}{M}. \qquad (2)$$

Here, $\gamma$ is the radial velocity of the planetary system, $\phi$ is the orbital phase, $\tau$ is the orbital period,
$M_p$ is the planet mass, $M_*$ is the stellar mass, $M = M_p + M_* \simeq M_*$ (as $M_p \ll M_*$), and $i$ is the
inclination of the orbit with respect to the plane of the sky.

The sensitivity of an RV experiment is essentially defined as the minimum mass planetary 
companion that can be detected at a given period. For a given measurement of A, we invert 
Equation 2 to obtain a relation between $M_p$ and $\tau$:

$$M_p \sin i = \left(\frac{M^2}{2\pi G}\right)^{1/3} A\, \tau^{1/3}. \qquad (3)$$

We see that $M_p \propto \tau^{1/3}$, which means that our sensitivity decreases as we look for longer-period
orbits. Thus, it is not surprising that the first detection of an extra-solar planet was a planet with a
very short orbital period (Mayor & Queloz 1995). There is an additional effect which makes it
difficult to identify planets with long periods. With only a baseline of a few years, the RV surveys 
can only observe some fraction of A for orbital periods longer than the observing baseline. In this 
(long-period) regime, the sensitivity depends critically on the orbital phase. The most sensitive 
mass estimates are obtained when the acceleration in the radial direction attains the extreme 
value. In contrast, when the radial acceleration is close to zero, the lack of curvature in the orbit 
makes the signal difficult to distinguish from the unknown systemic velocity of the system. These 
two effects combine to explain why all the planetary detections to date have orbital periods of less
than 3 years (Mayor & Queloz 1995; Marcy & Butler 1996; Butler & Marcy 1996; Butler et al.
1997; Noyes et al. 1997; Cochran et al. 1997; Marcy et al. 1998; Fischer et al. 1998; Mayor et al.
1998; Marcy et al. 1999; Queloz et al. 1999).
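
As a rough numerical illustration of the mass-period relation above, the following sketch evaluates $M_p \sin i = (M^2 \tau / 2\pi G)^{1/3} A$ for a given velocity amplitude and period. The function name, the unit choices (m/s, years, solar and Jupiter masses) and the physical constants are our own assumptions for illustration, not values quoted in the text.

```python
import math

# Assumed physical constants (SI); not taken from the paper.
G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30   # solar mass, kg
M_JUP = 1.898e27   # Jupiter mass, kg
YEAR = 3.156e7     # one year, s

def min_mass_sin_i(amplitude_ms, period_yr, stellar_mass_msun=1.0):
    """M_p sin i in Jupiter masses from Equation 3 (valid for M_p << M_*)."""
    m_star = stellar_mass_msun * M_SUN
    tau = period_yr * YEAR
    mass_kg = (m_star ** 2 * tau / (2.0 * math.pi * G)) ** (1.0 / 3.0) * amplitude_ms
    return mass_kg / M_JUP

# A Jupiter-like signal: about 12.5 m/s at an 11.86 yr period around a solar-mass star.
jupiter_like = min_mass_sin_i(12.5, 11.86)

# At fixed amplitude, an 8 times longer period doubles the detectable mass (tau^(1/3)).
ratio = min_mass_sin_i(3.0, 80.0) / min_mass_sin_i(3.0, 10.0)
```

The second quantity makes the $\tau^{1/3}$ scaling explicit: for a fixed measured amplitude, the corresponding planet mass grows slowly but steadily with period.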



2.1. Least Squares Fitting of Sinusoids 

The basic RV analysis consists of fitting the observations to the model specified in Equation 1
and then inferring the mass of the planet(s) from the fitted values through Equation 3. As noted
by several authors (e.g. Scargle 1982; Horne & Baliunas 1986; Nelson & Angel 1998), the
optimal fitting method is obtained by using the technique of Least Squares. To enable the use
of linear Least Squares fitting techniques we proceed by deriving a linear model equation from
Equation 1:

$$v(t_i) = v_c \cos(\omega t_i) + v_s \sin(\omega t_i) + \gamma; \qquad (4)$$

here $v_c = A \sin\phi$, $v_s = A \cos\phi$ and $\omega = 2\pi/\tau$.

There appears to be considerable discussion and debate about the $\gamma$ term in the literature
(e.g. Walker et al. 1995; Cumming et al. 1999). The origin of this debate can be traced to two
issues. First, it is an assumption of periodogram analyses that the signal is a pure sinusoidal wave
(i.e. $\gamma = 0$). Indeed, the classic periodogram problem is the sine wave (with no offset) buried in
zero-mean noise. The second issue is that astronomers are rarely interested in determining the
precise radial velocity of a star and its planetary system, i.e. the systemic motion. For this reason,
$\gamma$ is usually seen as a "nuisance" parameter. In some periodogram analyses, $\gamma$ is first determined
from the mean of the data and in others $\gamma$ is determined in conjunction with $v_c$ and $v_s$. Cumming
et al. (1999) refer to these as "fixed" and "floating" mean methods, respectively, while Walker et
al. (1995) call them "correlated" and "uncorrelated". However, our view is that such methods
for including $\gamma$ (and the additional extension of including a constant acceleration term; see §4.3)
only make the periodogram closer to the Least Squares method. Thus, on philosophical grounds
of generality we prefer the Least Squares approach. We justify this choice in greater detail below.

The basic fact is that $\gamma$ is needed to represent the physical model correctly. $\gamma$
may be an uninteresting parameter, but it is as unknown as $v_c$, $v_s$ and $\tau$. $\gamma$ can be dropped
from Equation 1 only if it can be demonstrated that it is not covariant with the remaining three
parameters, $v_c$, $v_s$ and $\tau$. As shown below, this is not the case and thus one must solve for all the
parameters simultaneously. In astrometry, the problem is considerably worse with the position
and proper motion being covariant with the parameters of a potential planetary orbit. As with
the RV case, one must solve for all unknown parameters simultaneously rather than sequentially,
and the periodogram analyses would then include "floating means" and "first derivatives".

Only in the short-period regime can $\gamma$ potentially get decoupled from the other parameters.
For this to happen, we need dense sampling over a number of cycles. In the long-period regime,
as noted below, the cross-talk never disappears and one must solve for $\gamma$, regardless of the density
of sampling. The covariance of $\gamma$ with orbital parameters is easily seen here since in this regime
we only measure a portion of the orbit, and in this limit the orbit can be approximated by a
linear term (constant velocity) and curvature (acceleration). The first term is covariant with the
systemic motion. In the astrometry context, Black & Scargle (1982) were the first to recognize the
consequences of this covariance.

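The parameters of our data set are as follows: the duration of the survey is $T_0$, $v'(t_i)$ is the
measured RV at epoch $t_i$, and $n_0$ is the number of measured epochs. With no loss of generality,
we let our time go from $t = -T_0/2$ to $t = T_0/2$. This device allows us to simplify the normal
equations (see below). To find the three unknowns, $v_c$, $v_s$ and $\gamma$, we minimize the sum of squared residuals

$$\sum_{i=0}^{n_0-1} \left[ v'(t_i) - v_c \cos(\omega t_i) - v_s \sin(\omega t_i) - \gamma \right]^2 \qquad (5)$$

with respect to $v_c$, $v_s$, and $\gamma$. This yields a matrix equation for the three unknowns:

$$\begin{pmatrix}
\sum_{i=0}^{n_0-1} \cos^2(\omega t_i) & \sum_{i=0}^{n_0-1} \cos(\omega t_i)\sin(\omega t_i) & \sum_{i=0}^{n_0-1} \cos(\omega t_i) \\
\sum_{i=0}^{n_0-1} \cos(\omega t_i)\sin(\omega t_i) & \sum_{i=0}^{n_0-1} \sin^2(\omega t_i) & \sum_{i=0}^{n_0-1} \sin(\omega t_i) \\
\sum_{i=0}^{n_0-1} \cos(\omega t_i) & \sum_{i=0}^{n_0-1} \sin(\omega t_i) & n_0
\end{pmatrix}
\begin{pmatrix} v_c \\ v_s \\ \gamma \end{pmatrix}
=
\begin{pmatrix}
\sum_{i=0}^{n_0-1} v'(t_i)\cos(\omega t_i) \\
\sum_{i=0}^{n_0-1} v'(t_i)\sin(\omega t_i) \\
\sum_{i=0}^{n_0-1} v'(t_i)
\end{pmatrix}. \qquad (6)$$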
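
The normal equations above can be assembled and solved directly. The sketch below is one minimal way to do so with the Python standard library; the helper names `solve3` and `fit_sinusoid`, the midpoint sampling grid, and the test values are illustrative assumptions rather than the procedure actually used for the results in this paper.

```python
import math

def solve3(a, b):
    """Solve a 3x3 linear system a x = b by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back-substitution
        s = m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = s / m[r][r]
    return x

def fit_sinusoid(times, velocities, tau):
    """Return (v_c, v_s, gamma) from the normal equations of the linear model."""
    w = 2.0 * math.pi / tau
    cos_t = [math.cos(w * t) for t in times]
    sin_t = [math.sin(w * t) for t in times]
    s_cs = sum(c * s for c, s in zip(cos_t, sin_t))
    a = [[sum(c * c for c in cos_t), s_cs, sum(cos_t)],
         [s_cs, sum(s * s for s in sin_t), sum(sin_t)],
         [sum(cos_t), sum(sin_t), float(len(times))]]
    b = [sum(v * c for v, c in zip(velocities, cos_t)),
         sum(v * s for v, s in zip(velocities, sin_t)),
         sum(velocities)]
    return solve3(a, b)

# Noise-free check in the long-period regime (tau > T0); units are months and m/s.
T0, n0, tau = 144.0, 144, 300.0
times = [-T0 / 2 + (i + 0.5) * T0 / n0 for i in range(n0)]
w = 2.0 * math.pi / tau
data = [4.0 * math.cos(w * t) - 2.0 * math.sin(w * t) + 7.5 for t in times]
vc, vs, gamma = fit_sinusoid(times, data, tau)
```

Because all three parameters are solved for simultaneously, the same routine applies unchanged in the short-period, long-period and sparse-data regimes.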



In the short-period regime where $\tau \ll T_0$, we observe many cycles and thus, under most
reasonable sampling schemes, the summations of $\sin(\omega t)$, $\cos(\omega t)$, and $\sin(\omega t)\cos(\omega t)$
will average to zero, while $\sin^2(\omega t)$ and $\cos^2(\omega t)$ will average to 1/2. Thus, in the short-period
regime the matrix in Equation 6 becomes diagonal, and the fit parameters are given by

$$v_c = \frac{2}{n_0} \sum_{i=0}^{n_0-1} v'(t_i) \cos(\omega t_i) \qquad (7)$$

$$v_s = \frac{2}{n_0} \sum_{i=0}^{n_0-1} v'(t_i) \sin(\omega t_i) \qquad (8)$$

$$\gamma = \frac{1}{n_0} \sum_{i=0}^{n_0-1} v'(t_i). \qquad (9)$$
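
As an illustration of the short-period estimators, the sketch below applies $v_c = (2/n_0)\sum v'\cos(\omega t_i)$, $v_s = (2/n_0)\sum v'\sin(\omega t_i)$ and $\gamma = (1/n_0)\sum v'$ to a noise-free sinusoid sampled evenly over an integer number of cycles. The sampling grid and the test signal are assumptions chosen so that the sinusoidal sums vanish exactly.

```python
import math

def short_period_estimates(times, velocities, tau):
    """Closed-form fit parameters, valid only when the normal matrix is diagonal."""
    w = 2.0 * math.pi / tau
    n0 = len(times)
    vc = 2.0 / n0 * sum(v * math.cos(w * t) for t, v in zip(times, velocities))
    vs = 2.0 / n0 * sum(v * math.sin(w * t) for t, v in zip(times, velocities))
    gamma = sum(velocities) / n0
    return vc, vs, gamma

# 144 evenly spaced epochs covering exactly 12 cycles (tau = T0 / 12).
T0, n0, tau = 144.0, 144, 12.0
times = [-T0 / 2 + (i + 0.5) * T0 / n0 for i in range(n0)]
A, phi, gamma_true = 5.0, 0.7, 20.0
w = 2.0 * math.pi / tau
data = [A * math.sin(w * t + phi) + gamma_true for t in times]
vc, vs, gamma = short_period_estimates(times, data, tau)
```

With this sampling the estimates recover $v_c = A\sin\phi$, $v_s = A\cos\phi$ and $\gamma$ to rounding error; for sparse or uneven sampling the off-diagonal sums no longer vanish and the full normal equations should be solved instead.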



In the long-period regime, $\tau > T_0$, the matrix is not diagonal since most of the terms do not
average to zero. However, by design (and assuming a reasonable sampling scheme), the sinusoidal
summations involving only one power of $\sin(\omega t)$ will average to zero, yielding the following normal
equations:

$$v_c \sum_{i=0}^{n_0-1} \cos^2(\omega t_i) + \gamma \sum_{i=0}^{n_0-1} \cos(\omega t_i) = \sum_{i=0}^{n_0-1} v'(t_i) \cos(\omega t_i) \qquad (10)$$

$$v_s \sum_{i=0}^{n_0-1} \sin^2(\omega t_i) = \sum_{i=0}^{n_0-1} v'(t_i) \sin(\omega t_i) \qquad (11)$$

$$v_c \sum_{i=0}^{n_0-1} \cos(\omega t_i) + \gamma\, n_0 = \sum_{i=0}^{n_0-1} v'(t_i). \qquad (12)$$
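
The coupling between $v_c$ and $\gamma$ in these long-period normal equations can be quantified with a short sketch. It relies on the standard Least Squares result that the parameter covariance is proportional to the inverse of the normal matrix; for the $2\times2$ block in $(v_c, \gamma)$ this gives a correlation coefficient of $-\sum\cos(\omega t_i)/\sqrt{n_0 \sum\cos^2(\omega t_i)}$. The function name and sampling grid are our assumptions, and the formula presumes sampling symmetric about $t = 0$ so that $v_s$ decouples.

```python
import math

def vc_gamma_correlation(times, tau):
    """Correlation of fitted v_c and gamma implied by the 2x2 (v_c, gamma) block."""
    w = 2.0 * math.pi / tau
    cos_t = [math.cos(w * t) for t in times]
    sum_c = sum(cos_t)
    sum_cc = sum(c * c for c in cos_t)
    return -sum_c / math.sqrt(len(times) * sum_cc)

T0, n0 = 144.0, 144
times = [-T0 / 2 + (i + 0.5) * T0 / n0 for i in range(n0)]
corr_short = vc_gamma_correlation(times, 12.0)       # many full cycles
corr_long = vc_gamma_correlation(times, 10.0 * T0)   # small fraction of one cycle
```

In the short-period case the correlation vanishes, whereas for $\tau = 10\,T_0$ it approaches $-1$: the cosine is nearly indistinguishable from a constant offset over the baseline.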



3. Type I Errors 

Type I errors describe the probability that a high amplitude will be obtained even when no
signal is present in the data. We assess the probability of type I errors, assuming that our data
set consists of Gaussian noise with a mean of zero and a standard deviation of $\sigma_0$.

Short-Period Regime. From Equations 7-9 we note that for most reasonable sampling schemes,
$v_c$, $v_s$ and $\gamma$ are simply sums of Gaussian variables and thus from the Gaussian addition theorem,
all three derived parameters are also Gaussian variables. Specifically, $v_c$ and $v_s$ obey a Gaussian
distribution with a mean of zero and a standard deviation of

$$\sigma = \sqrt{\frac{2}{n_0}}\, \sigma_0, \qquad (13)$$

where $n_0$ is the number of measurements. Denoting by $V_{1s}$ the value of $|v_c|$ (or $|v_s|$) that is
exceeded in 1% of cases, we note that

$$V_{1s} = 2.61\,\sigma = \frac{3.69}{\sqrt{n_0}}\, \sigma_0. \qquad (14)$$

From Equation 9 we see that $\gamma$ follows a Gaussian distribution with zero mean and a variance of
$\sigma_0^2/n_0$. Denoting $\Gamma = |\gamma|$, we note that the 99th percentile value of $\Gamma$ is $\Gamma_{1s} = 2.61\,\sigma_0/\sqrt{n_0}$. Thus
$\Gamma_{1s} = 0.71\, V_{1s}$.

Because we are interested in the fitted amplitude, we will now examine the combined statistics
of $v_c$ and $v_s$. The probability density function for $v_c$ and $v_s$ is

$$p(v_c, v_s)\, dv_c\, dv_s = \frac{1}{2\pi\sigma^2}\, e^{-v_c^2/2\sigma^2}\, e^{-v_s^2/2\sigma^2}\, dv_c\, dv_s. \qquad (15)$$

Denoting by $K$ the square of the inferred amplitude, $K = v_c^2 + v_s^2$, we note that the probability
density function of $K$ is an exponential and find the probability to be

$$P(K < K_{1s}) = 1 - e^{-K_{1s}/2\sigma^2}. \qquad (16)$$

Thus, the squared velocity amplitude that is exceeded by pure fluctuations in only 1% of the cases
is

$$K_{1s} = 9.21\,\sigma^2 = \frac{18.42\,\sigma_0^2}{n_0}. \qquad (17)$$
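
The short-period threshold follows from inverting the exponential distribution of $K$: setting $1 - e^{-K_{1s}/2\sigma^2} = 0.99$ with $\sigma^2 = 2\sigma_0^2/n_0$ gives $K_{1s} = -2\sigma^2 \ln(0.01) \approx 18.42\,\sigma_0^2/n_0$. A minimal sketch of this arithmetic, with a function name and a configurable false-alarm level that are our own additions:

```python
import math

def k1s_threshold(sigma0, n0, false_alarm=0.01):
    """Squared amplitude exceeded by pure noise with probability `false_alarm`."""
    sigma_sq = 2.0 * sigma0 ** 2 / n0               # variance of v_c and v_s
    return -2.0 * sigma_sq * math.log(false_alarm)  # inverse of 1 - exp(-K / 2 sigma^2)

# sigma0 = 3 m/s with 144 monthly epochs, the survey parameters used in the simulations.
k1s = k1s_threshold(3.0, 144)
```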

Long-Period Regime. In this regime, the expressions for $v_c$, $v_s$ and $\gamma$ are not as simple as those
in the short-period regime, and so we resort to simulations. To this end, we create a synthetic data
set consisting of Gaussian noise with $\sigma_0 = 3$ m s$^{-1}$, which is the best accuracy thus far obtained
for radial velocity measurements of this type (Butler et al. 1996). We sample the synthetic data
at one-month intervals for $T_0 = 12$ years. We explore fitted periods from 5 years to 100 years,
choosing the interval between sampled periods so as to result in a 1 radian decrease in the total
number of orbital cycles over the length of the observations:

$$\Delta\tau = \frac{\tau^2}{2\pi T_0}. \qquad (18)$$

Thus the sequence of the periods which we consider is 60, 64, 69, 74, 80, 87, 95, 105, 117, 132, 152,
177, 212, 261, 337, 463, 699, and 1239 months.
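
The period grid can be regenerated by iterating $\tau \rightarrow \tau + \tau^2/(2\pi T_0)$ from 60 months with $T_0 = 144$ months and rounding each value to the nearest month. In the sketch below the stopping rule (stop after the first period that reaches 100 years) is our assumption, chosen because it reproduces the listed sequence.

```python
import math

def period_grid(tau_start=60.0, tau_max=1200.0, T0=144.0):
    """Periods in months, each step removing 1 radian of orbital phase over T0."""
    periods = []
    tau = tau_start
    while True:
        periods.append(tau)
        if tau >= tau_max:   # assumed stopping rule: first period at or beyond 100 yr
            break
        tau += tau ** 2 / (2.0 * math.pi * T0)
    return periods

grid = [round(p) for p in period_grid()]
```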

For each period, $\tau$, we simulate $N = 1000$ data sets and carry out the Least Squares fit. We
set $K_1$ equal to the 10th highest $K$ that arises. Clearly, 99% of the fitted $K$'s will lie below this
value. A plot of $K_1$ versus $\tau$ is shown in Figure 1. As Figure 1 indicates, our simulations are in
excellent agreement with the expected value of $K_{1s}$ in the short-period regime (Equation 17).
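
The Monte Carlo estimate of $K_1$ can be sketched as follows for a short-period case. To keep the example compact it uses the closed-form short-period estimators for $v_c$ and $v_s$ rather than a full Least Squares solve, which is only appropriate when the sampling covers an integer number of cycles; the function name, random seed and sampling grid are assumptions.

```python
import math
import random

def tenth_highest_k(tau, T0=144.0, n0=144, sigma0=3.0, n_sets=1000, seed=1):
    """Fit pure Gaussian noise n_sets times and return the 10th highest K = v_c^2 + v_s^2."""
    rng = random.Random(seed)
    w = 2.0 * math.pi / tau
    times = [-T0 / 2 + (i + 0.5) * T0 / n0 for i in range(n0)]
    cos_t = [math.cos(w * t) for t in times]
    sin_t = [math.sin(w * t) for t in times]
    ks = []
    for _ in range(n_sets):
        noise = [rng.gauss(0.0, sigma0) for _ in range(n0)]
        vc = 2.0 / n0 * sum(v * c for v, c in zip(noise, cos_t))
        vs = 2.0 / n0 * sum(v * s for v, s in zip(noise, sin_t))
        ks.append(vc * vc + vs * vs)
    ks.sort(reverse=True)
    return ks[9]   # 990 of the 1000 fitted K values lie below this one

k1_sim = tenth_highest_k(tau=12.0)
k1_theory = 18.42 * 3.0 ** 2 / 144   # short-period expectation, 18.42 sigma0^2 / n0
```

With only 1000 realizations the simulated value scatters by several percent around the analytical one, so agreement should be judged at that level.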

Thus far, our analysis has followed that of Nelson & Angel (1998), but at this point the two
analyses diverge. Nelson & Angel (1998) assume that $K_1$ in the long-period regime is a power law
and from their simulations find Ki oc r^'^^. The value of this exponent has no natural explanation 
and indeed it was this unusual value that motivated the present analysis. 

The principal goal of our analysis was to understand the behavior of $K_1$ in the long-period
regime and to derive an analytical expression for $K_1$ that is valid for all fitted periods. We begin
by examining the covariance matrix of $v_c$, $v_s$, and $\gamma$. At short periods, we find these three fitted
parameters are uncorrelated; this is expected from Equations 7-9.

However, the situation is quite different in the long-period regime, as can be seen from the
plots in Figure 2. From this figure we conclude the following:

1. $v_c$ and $v_s$ are correlated.

2. $v_c$ and $\gamma$ are anti-correlated.

3. $v_s$ and $\gamma$ are uncorrelated.

The corollary to this result is that phase becomes important in the long-period regime. Indeed,
as can be seen from Figure 2, $\phi \approx \pm 90°$ is preferred. Nelson & Angel (1998) and Cumming et al.
(1999) recognized that the phase would become non-random in the long-period regime. However,
neither they nor others have explored the full implications of this fact. Below we look into this
issue in more detail.

We now provide a simple (physical) explanation of the results displayed in Figure 2. When
$\tau \lesssim T_0$, sinusoids with random phase can be fit to the Gaussian data. Clearly, the amplitude
of these sinusoids cannot be significantly bigger than the vertical scale of the data (which is
approximately $\sigma_0 = 3$ m s$^{-1}$). However, when $\tau > T_0$, this is no longer necessarily the case. The
maximum value of $K$ is obtained when we fit a cosine to random data since a cosine is flat around
$t = 0$; we remind the reader of the choice of our time baseline, $[-T_0/2, T_0/2]$. Recall that for small
$t$, $\cos t \simeq 1 - t^2/2$. Thus, the size of the fitted cosine is limited by how much it deviates from a
constant in the range from $t = [0, T_0/2]$. This deviation is given by

$$1 - \cos\left(\frac{\pi T_0}{\tau}\right). \qquad (19)$$



The amplitude of this cosine is then chosen so that this deviation from a constant will be
roughly equal to the vertical scale of the noise ($\sigma_0$). So, Equation 19 tells us the fraction of the
total fitted amplitude, and thus the actual fitted amplitude is

$$V_{c1} = \frac{2 V_{1s}}{1 - \cos(\pi T_0/\tau)} \quad \text{for } \tau > T_0; \qquad (20)$$

here, $V_{c1}$ is the value of $|v_c|$ which will be exceeded in 1% of Least Squares fits to Gaussian noise,
$V_{1s}$ is the corresponding $V_{c1}$ in the short-period regime (Equation 14), and 2 is a normalization
factor (since the above deviation was peak-to-peak).
Now, we consider the behavior of $v_s$ in the long-period regime. Here, there is a slight
difference: because the full amplitude of a sine centered around $t = 0$ can be observed in the
interval $[-\tau/4, \tau/4]$, the "long-period regime for $v_s$" does not actually begin until $\tau > 2T_0$.
However, once we are in the long-period regime, our analysis is much the same. In particular, we
find the amplitude of a sine wave such that in the interval from $t = [-T_0/2, T_0/2]$, the sine wave
does not exceed the vertical scale of the noise. Following the same lines as the analysis for $v_c$, it is
clear that this amplitude must be

$$V_{s1} = \frac{V_{1s}}{\sin(\pi T_0/\tau)} \quad \text{for } \tau > 2T_0. \qquad (21)$$
This function is plotted with the simulated data in Figure ^. Not surprisingly, $V_{s1}$ is considerably
smaller than $V_{c1}$.
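
The two long-period thresholds can be compared numerically. The sketch below assumes the forms $V_{c1} = 2V_{1s}/[1 - \cos(\pi T_0/\tau)]$ for $\tau > T_0$ and $V_{s1} = V_{1s}/\sin(\pi T_0/\tau)$ for $\tau > 2T_0$, and falls back to the short-period value $V_{1s}$ otherwise; the function names and the fallback are our own choices for illustration.

```python
import math

def v_c1(tau, T0, v_1s):
    """1% threshold on |v_c|; grows rapidly once tau exceeds T0."""
    if tau <= T0:
        return v_1s
    return 2.0 * v_1s / (1.0 - math.cos(math.pi * T0 / tau))

def v_s1(tau, T0, v_1s):
    """1% threshold on |v_s|; its long-period regime begins at tau = 2 T0."""
    if tau <= 2.0 * T0:
        return v_1s
    return v_1s / math.sin(math.pi * T0 / tau)

T0 = 144.0
v_1s = 3.69 * 3.0 / math.sqrt(144)   # 3.69 sigma0 / sqrt(n0) for sigma0 = 3 m/s, n0 = 144
vc_long = v_c1(10.0 * T0, T0, v_1s)
vs_long = v_s1(10.0 * T0, T0, v_1s)
```

At $\tau = 10\,T_0$ the cosine threshold is roughly an order of magnitude larger than the sine threshold, which is the origin of the elongated $v_c$-$v_s$ distribution discussed in the text.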

Understanding the behavior of $\gamma$ is not difficult in light of our understanding of $v_c$. Since the
mean value of the simulated data is zero, our fitted function must also, in the mean, be zero. The
anti-correlation between $\gamma$ and $v_c$ in Figure 2 confirms this expectation. From the above discussion,
we know that the fitted signal is primarily a cosine with an amplitude given by Equation 20. We
must choose a $\gamma$ that translates the fitted cosine such that the result is "centered" around $y = 0$.
This translation is accomplished by subtracting the fitted amplitude $A$ (which produces a cosine
wave whose maximum value is zero) and then adding back $\sigma_0/2$ in order to properly center the
fitted signal. Thus, we expect

Fi = Fi, { ^ (22) 



1 — COS 


vrTo 




T 



Note that we expect no correlation between $v_s$ and $\gamma$, because $v_s$ is already centered around zero,
and requires no translation (or physically, a sinusoidal orbit with $\phi = 0°$ is non-degenerate with
constant velocity motion).

Equipped with the behaviors of $V_{c1}$ and $V_{s1}$, we are now in a position to write down an
analytical expression for $K_1$ ($\propto V_{c1}^2 + V_{s1}^2$) at all periods:

for r < To 

for To <T <2To _ (^23) 
+ . forr>2To 





18.420-2 
no 




(1— cos 








^ 1 


Jl_cos[^])2 



Here, the factor of 5/4 is chosen to enforce continuity of $K_1$ for all periods. Since $V_{c1} \gg V_{s1}$ for
$\tau > T_0$, to a very good approximation, we can express Equation 23 as



(1 — cos 



ttTo 



forr>ro. (24) 



We prefer this compact formula over the more exact formula (Equation 23). As demonstrated by
Figures Q and ^|-^, the analytical expressions derived above provide excellent fits to the simulated 
data. Thus, we have fulfilled our original objective of determining an analytical understanding of 
Ki. 

We now turn to the importance of the phase term. As can be seen from Figure ||, in the short 
period regime, the distribution of Vc and Vs is cylindrically symmetric. However, in the long-period 
regime the distribution becomes highly elliptical, and the Least Squares fit overwhelmingly prefers 
orbits with $\phi = \pm 90°$. An orbit sampled at a phase of $\pm 90°$ has the smallest radial acceleration
and thus for a given set of measurements yields the largest value of K. 

We have so far considered only the statistics of K, which is perfectly reasonable when the 
Vc-Vs distribution is cylindrically symmetric. However, in the long-period regime this distribution 
is elliptical, in which case the exclusive use of the radial parameter K is bound not to be optimal. 
Thus in the long-period regime, we need to look at both the Vc and Vs, or equivalently the phase 
and amplitude of the fitted parameters. 

To this end, we define ellipses in the $v_c$-$v_s$ plane, $e_1$, such that 1% of simulated fitted pairs lie
outside this ellipse. The parameters of these ellipses are easily obtained since we have analytical
expressions for $V_{c1}$ and $V_{s1}$ (Equations 20 and 21). Further discussion and exploitation of this
"amplitude-phase" ($K$-$\phi$) analysis is postponed to Section 4.2.

Uneven and Sparse Sampling. We reiterate that the above results have been derived under
the assumption of dense, even sampling. Here, we examine the validity of these results in the case
of random sampling, paying particular attention to the regime where $n_0$ is small.



As discussed in Section 2.1, Equations 14-17 are only valid if the chosen sampling scheme
preserves the independence of $v_c$, $v_s$, and $\gamma$. If we have a large number of samples over a number
of cycles, then even if our sampling is completely random, we expect that the dense sampling of
the sinusoidal summations will preserve the diagonality of Equation 6 (and hence Equations 7-9
will hold). However, for sparse sampling (i.e. small $n_0$), our choice of sampling scheme becomes
important. In particular, if we sample the data evenly in the interval $[-T_0/2, T_0/2]$, then Equation
6 remains diagonal, and the independence of $v_c$, $v_s$, and $\gamma$ is preserved. In contrast, if the data are
sparsely and unevenly sampled, then the summations of $\sin(\omega t)$, $\cos(\omega t)$, and $\sin(\omega t)\cos(\omega t)$
may no longer average to zero, and $v_c$, $v_s$, and $\gamma$ will be covariant. Physically, this corresponds to
the fact that for a small number of randomly spaced samples, the sampled sinusoid might look like
a straight line (e.g. if all the samples happen to lie at the zero-crossings of the sine wave). Thus,
for sparse and uneven sampling, Equations 14-17 do not accurately describe the sensitivity of the
Least Squares fit.

We verify these assertions through simulations: for $n_0 = \{5, ..., 25\}$, we simulate $N = 1000$
data sets and do Least Squares fits to sinusoids with $\tau = 0.2\,T_0$. The sampling times are drawn
from a distribution given by $t_j \in [\Delta t\,(j - R), \Delta t\,(j + R)]$. Here, $\Delta t$ denotes the sampling interval,
given by $T_0/n_0$, and $R$ is a parameter describing the unevenness of the sampling: $R = 0$ gives
an even sampling scheme, and $R = 1/2$ gives random sampling. The results of these simulations
are shown in Figure ^. We see that for $n_0 \lesssim 10$, one suffers substantial losses in sensitivity if the
data are unevenly sampled. This phenomenon was noted by Cumming et al. (1999). However,
the basic reason why we see this phenomenon is that Equation 17 is no longer valid. This
discussion re-emphasizes the generality and robustness of the Least Squares approach over the
periodogram approach.
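
The sampling scheme used in these simulations, with each epoch drawn uniformly from $[\Delta t\,(j - R), \Delta t\,(j + R)]$ and $\Delta t = T_0/n_0$, can be sketched as below. The index range $j = 0, \ldots, n_0 - 1$ and the absence of any recentering onto $[-T_0/2, T_0/2]$ are our assumptions; the text does not specify them.

```python
import random

def sample_times(n0, T0, R, rng=None):
    """Draw one epoch per sampling interval; R = 0 is even, R = 1/2 is random."""
    rng = rng or random.Random()
    dt = T0 / n0
    return [rng.uniform(dt * (j - R), dt * (j + R)) for j in range(n0)]

even = sample_times(5, 10.0, 0.0, random.Random(0))
jittered = sample_times(20, 144.0, 0.5, random.Random(1))
```

Intermediate values of `R` interpolate between the two extremes, which allows the loss of sensitivity to be traced as the sampling becomes progressively less regular.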

We now repeat the above simulations using fitted sinusoids with $\tau = 10\,T_0$, to explore the
effects of sparse and uneven sampling in the long-period regime. The results of this simulation
are shown in Figure ^. From this figure we see that in the long-period regime, the evenness of
the sampling scheme is not nearly as important as it is in the short-period regime. This is to be
expected, because our analysis of the behavior of $K_1$ in the long-period regime takes into account
the covariance between $v_c$, $v_s$, and $\gamma$, and thus we expect it to be applicable to sparsely and
unevenly sampled data.

We end this section by acknowledging that our treatment of the statistics is not accurate in
the low-$n_0$ regime. At the very least one needs to be aware of the loss of degrees of freedom
because three parameters are obtained from the data. The correct value of the degrees of freedom
will depend on what statistic is being estimated. This issue is not central to the main goal of this
paper and we intend to investigate it in a later paper.

4. Type II Errors 

So far we have computed the probability of detecting an apparent signal generated purely
by noise. In the language of inference, we have discussed Type I probabilities. Our analysis was
centered on obtaining the statistics of $K$, the squared amplitude from the fitted parameters $v_c$ and
$v_s$. In particular, we estimated $K_1$, the 99th percentile of $K$ (see Equation ^). However, following
this analysis, we noted that in the long-period regime, $v_c$ and $v_s$ form an elliptical distribution,
and thus merely looking at the statistics of $K$, a radial parameter, was not optimal.

We now consider Type II probabilities: the probability of failing to detect a genuine signal
due to contamination by noise. The goal of this section is to understand the statistics of $K$ in the
presence of both signal and noise. To this end, we simulate a data set that consists of signal and
noise:

$$v'(t_i) = \sqrt{K}\, \sin\left(\frac{2\pi t_i}{\tau} + \phi\right) + N(t_i), \qquad (25)$$

where $\sqrt{K}$ is the amplitude of the signal, and $N(t_i)$ is the Gaussian noise. We let $\phi$ be drawn
from a uniform distribution in the interval $[0, 2\pi]$, an appropriate assumption for circular
orbits. We choose an initial signal amplitude of $\sigma_0/2$, and then do $N = 1000$ Least Squares fits
(with the same parameters as in Section 3).
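
A single realization of the signal-plus-noise model, $v'(t_i) = \sqrt{K}\sin(2\pi t_i/\tau + \phi) + N(t_i)$ with $\phi$ uniform in $[0, 2\pi]$, might be generated as in the sketch below. The function name, the returned phase and the sampling grid are illustrative assumptions.

```python
import math
import random

def simulate_rv(times, K, tau, sigma0, rng):
    """Return (velocities, phi) for a sinusoid of squared amplitude K plus Gaussian noise."""
    phi = rng.uniform(0.0, 2.0 * math.pi)   # random orbital phase
    amp = math.sqrt(K)
    data = [amp * math.sin(2.0 * math.pi * t / tau + phi) + rng.gauss(0.0, sigma0)
            for t in times]
    return data, phi

T0, n0 = 144.0, 144
times = [-T0 / 2 + (i + 0.5) * T0 / n0 for i in range(n0)]
# Initial signal amplitude sigma0 / 2, i.e. K = (sigma0 / 2)^2, at a period of 2 T0.
noisy, phi_noisy = simulate_rv(times, (3.0 / 2.0) ** 2, 2.0 * T0, 3.0, random.Random(2))
# Setting sigma0 = 0 isolates the deterministic part of the model.
clean, phi_clean = simulate_rv(times, 9.0, 300.0, 0.0, random.Random(0))
```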

4.1. Amplitude-Only Analysis 

Following the path used for Type I errors, we will first base our analysis on $K$, i.e. we will
ignore the issue of the elliptical nature of the $v_c$-$v_s$ distribution. At each period, we determine
how many of the fitted amplitudes lie below $K_1$. If it is less than $0.01N$, then we have found the
value of $K$ such that 99% of the fitted amplitudes lie above $K_1$. We call this $K_{99}$. If, however,
more than 1% of the fitted amplitudes lie below $K_1$, we increment the signal amplitude by $\sqrt{K_1}/20$
until we find $K_{99}$. A plot of $K_{99}/K_1$ versus period is shown in Figure 8.

We note that our choice of 99% confidence for $K$ is arbitrary and also very conservative. Most
observers would be keen to make a discovery rather than set stringent upper limits. Thus one may
wish to consider $K_{50}$ (or $K_{90}$), which gives the signal amplitude necessary to be detected 50%
(or 90%) of the time. Plots of $K_{90}$ and $K_{50}$ versus period are shown in Figure 9 and Figure 10,
respectively.



4.2. Amplitude-Phase Analysis 

In Section 3, we showed that in the long-period regime, the $v_c$-$v_s$ distribution is elliptical. In
view of this, the phase of the signal in Equation 25 is critical. This point is best understood in the
results of the simulations displayed in Figure 11 for five cases: (a) no signal, (b)-(d) a signal with
amplitude $\sqrt{K_1}$ (at that period) and $\phi = 0°, 45°, 90°$, respectively, and (e) a signal with amplitude
of $\sqrt{K_1}$ and the phase being randomly (for each simulation) chosen from the range $[0, 2\pi]$ (uniform
probability density function).



In Figure 11, the amplitude-only analysis would consider all points that lie inside the circle
of radius $\sqrt{K_1}$ to be indistinguishable from those produced by noise. However, in the framework
of the $K$-$\phi$ analysis, we would consider all points within the ellipse to be indistinguishable from
those produced by noise. The superiority of the $K$-$\phi$ analysis is evident given that the
ellipse is smaller than the circle. This superiority is reflected numerically as follows:
the fraction of simulations in which the signal is reliably recovered by the $K$-$\phi$ analysis is (100%,
100%, 33%, 95%) for cases (b-e) whereas for the amplitude-only analysis the corresponding
fractions are (25%, 35%, 33%, 34%).
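
The difference between the two detection criteria can be made concrete with a small geometric sketch. It assumes an axis-aligned ellipse in the $v_c$-$v_s$ plane whose semi-axes `a_c` and `a_s` are supplied by the caller; how these semi-axes are scaled from $V_{c1}$ and $V_{s1}$ so that exactly 1% of noise fits fall outside is not reproduced here, and the numbers used are purely hypothetical.

```python
def outside_circle(vc, vs, k_threshold):
    """Amplitude-only test: is K = v_c^2 + v_s^2 above the threshold K_1?"""
    return vc * vc + vs * vs > k_threshold

def outside_ellipse(vc, vs, a_c, a_s):
    """Amplitude-phase test: is (v_c, v_s) outside the ellipse with semi-axes a_c, a_s?"""
    return (vc / a_c) ** 2 + (vs / a_s) ** 2 > 1.0

# Hypothetical long-period thresholds (m/s): long axis along v_c, short axis along v_s.
a_c, a_s = 40.0, 3.0
k_1 = a_c ** 2

# A signal with phi = 0 lies along the v_s axis (v_c = A sin(phi) = 0, v_s = A).
sine_like = (0.0, 5.0)
# A signal with phi = 90 degrees lies along the v_c axis.
cosine_like = (30.0, 0.0)
```

A sine-like point of modest amplitude falls inside the circle but outside the ellipse, so only the amplitude-phase test flags it; a cosine-like point of the same kind is missed by both, consistent with the phase dependence of the recovery fractions quoted above.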



A comparison of Figures 8-10 shows the increasing gain of the $K$-$\phi$ analysis as the tolerance
for committing type II error is increased. When the detection probability is set to 99%, the $K$-$\phi$
analysis lowers the $K$ threshold by 20% whereas if we are willing to accept a 50% type II error
probability then the $K$ threshold is lowered by nearly a factor of 30!



4.3. Linear Analysis of Long Term Trends 

Above we illustrated the superiority of an amplitude-phase analysis over an amplitude-only
analysis. However, some previous authors (Walker et al. 1995; Cumming et al. 1999) have
analyzed long-term trends in the data (i.e. apparent signals with $\tau \gg T_0$) by examining the slope
of the best-fit straight line, denoted by $a$. This best-fit line is subtracted from the data, and an
amplitude-only analysis is subsequently performed on the residuals. Here, we discuss this "slope
analysis" in the context of the results of Sections 4.1 and 4.2.



To determine the sensitivity of slope analysis, we simulate 1000 data sets with Gaussian
noise of zero mean and $\sigma_0 = 3$ m s$^{-1}$, sampled at one-month intervals for $T_0 = 12$ years. For
each simulated data set, we perform a Least Squares fit to a straight line, given by $at + b$. The
99th percentile slope, denoted by $A_1$, is determined by finding the value of $|a|$ such that 99% of
simulated data sets yield $|a| < A_1$. Next, we determine the type II errors by injecting a sinusoidal
signal into the simulated data (Equation 25) and finding the necessary squared signal amplitude
($K$) such that the signal can be reliably recovered. This analysis is the same as that carried out
in Section 4.1 except that instead of using the amplitude of fitted sinusoids, we use the slope of
fitted straight lines to set confidence limits. Thus, at each sampled period, we can distinguish a
real detection from one produced by noise if the fitted $|a|$ is greater than $A_1$.
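
The slope threshold $A_1$ can be estimated with a short simulation: fit $at + b$ to pure Gaussian noise many times and take the value of $|a|$ below which 99% of the fits fall. In the sketch below the function names, the seed and the comparison with the Gaussian expectation $2.576\,\sigma_0/\sqrt{\sum (t_i - \bar{t})^2}$ are our own additions for illustration.

```python
import math
import random
from statistics import NormalDist

def fit_line(times, velocities):
    """Least Squares straight line a*t + b; returns (a, b)."""
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(velocities) / n
    s_tt = sum((t - t_mean) ** 2 for t in times)
    s_tv = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, velocities))
    a = s_tv / s_tt
    return a, v_mean - a * t_mean

def slope_threshold(times, sigma0, n_sets=1000, seed=1):
    """Value of |a| such that 99% of noise-only fits have a smaller slope."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_sets):
        noise = [rng.gauss(0.0, sigma0) for _ in times]
        slopes.append(abs(fit_line(times, noise)[0]))
    slopes.sort()
    return slopes[n_sets - n_sets // 100]

T0, n0, sigma0 = 144.0, 144, 3.0
times = [-T0 / 2 + (i + 0.5) * T0 / n0 for i in range(n0)]
a_1 = slope_threshold(times, sigma0)

# Gaussian expectation for comparison: 99% two-sided quantile times the slope uncertainty.
s_tt = sum(t * t for t in times)   # times are centered, so the mean is zero
a_1_theory = NormalDist().inv_cdf(0.995) * sigma0 / math.sqrt(s_tt)
exact_slope, exact_offset = fit_line(times, [0.25 * t - 1.0 for t in times])
```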

The results of these simulations are shown in Figures 8-10. At sampled periods where the
amplitude-only analysis is more sensitive than the slope analysis, only the former is shown. We
note that although the slope analysis may be more sensitive than the amplitude-only analysis
(in the long-period regime), the amplitude-phase analysis still yields a significant improvement.
Moreover, the discrepancy between the amplitude-phase and slope analyses is largest for $\tau \lesssim 3T_0$.
We examine this discrepancy further by performing an analysis analogous to that shown in Figure
11e. In particular, for fitted periods of $\tau = 2\,T_0$ and $\tau = 10\,T_0$, we inject simulated data sets
consisting of sinusoids of random phase and $K = K_1$ plus Gaussian noise. The percentage of
signals recovered by the amplitude-only, amplitude-phase, and slope analyses (respectively) is
(57%, 87%, 59%) for $\tau = 2\,T_0$, and (34%, 95%, 91%) for $\tau = 10\,T_0$.

We can understand the relative sensitivities of the K-φ and slope analyses by noting that
the slope analysis is an explicitly linear technique, and thus it throws away any information
that is contained in the curvature of the sine or cosine components of the fitted sinusoid. The
K-φ analysis, in contrast, utilizes all of the information contained in the linear and curvature
components of the fitted sinusoid. These curvature components will tend to zero for τ ≫ T_0, and
thus in this regime the sensitivity of the slope analysis will approach that of amplitude-phase
analysis. However, for τ ~ 2-3 T_0, these curvature terms are significant, and the K-φ analysis
yields a substantial improvement over a slope analysis.
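
The size of the curvature term relative to the linear term can be made explicit with a short expansion. The block below is a sketch under an assumed signal form, v(t) = sqrt(K) sin(ωt + φ) with ω = 2π/τ and t measured from the middle of the baseline; this parameterization is chosen for illustration and need not match the one used elsewhere in the paper.

```latex
% Assumed (illustrative) signal: v(t) = \sqrt{K}\,\sin(\omega t + \phi),
% \omega = 2\pi/\tau, with |t| \le T_0/2 measured from mid-baseline.
\begin{align}
v(t) &= \sqrt{K}\sin\phi
        + \sqrt{K}\,\omega\cos\phi\; t
        - \tfrac{1}{2}\sqrt{K}\,\omega^{2}\sin\phi\; t^{2}
        + O\!\left((\omega t)^{3}\right), \\
\left|\frac{\text{curvature term}}{\text{linear term}}\right|_{t = T_0/2}
     &= \frac{\omega T_0}{4}\,|\tan\phi|
      = \frac{\pi T_0}{2\tau}\,|\tan\phi| .
\end{align}
% \tau = 2\,T_0  : \pi/4  \approx 0.79\,|\tan\phi|
% \tau = 10\,T_0 : \pi/20 \approx 0.16\,|\tan\phi|
```

For τ = 2 T_0 the expansion parameter ωt reaches π/2 at the ends of the baseline, so the truncated series is only qualitative there. It nevertheless shows that the quadratic term is comparable to the linear one at τ ~ 2-3 T_0 and becomes small as τ grows.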

The above discussion (in our view) re-emphasizes the generality of the Least Squares approach.
While Walker et al. (1995) and Cumming et al. (1999) introduce a modified periodogram to
account for a slope in the data, the Least Squares approach can be applied without modification,
as it has been in Section 4.2.

5. Conclusions

We have developed an analytical understanding of the sensitivity of the radial velocity 
technique of planet detection in the regime where the orbital period is longer than the total 
baseline of the observations. We also examined the sensitivity in the short-period regime, 
paying particular attention to the case of sparsely sampled data. Moreover, we have illustrated 
the benefits of Least Squares fitting over the equivalent, but more complicated, technique of
periodogram analysis; while the periodogram must be modified to deal with long-period signals, 
or with sparsely sampled data, the Least Squares approach can be applied in its basic form. 

We have also discovered the potentially exciting new result that in the long-period regime one
obtains additional information from the phase. Analyses of the RV data to date (e.g. Walker et
al. 1995; Nelson & Angel 1998; Cumming et al. 1999) have been based either on amplitude-only
analyses, or on amplitude analyses supplemented by slope analyses. However, as dramatically
demonstrated by Figures ^ and |l^, K-φ analysis has significant advantages over these previous
analyses, especially in the interesting regime where τ ~ 2-3 T_0.

Thus, we propose using a confidence test based on amplitude and phase, characterized by
ellipses in the v_c-v_s plane, e_1. The analytical expressions we have developed for V_c1 and
V_s1 provide the analytical behavior of e_1 as a function of period. For real RV data, like those
mentioned above, we suggest that the fitted v_c and v_s (at each period) be compared to the e_1
ellipses. Points lying outside these ellipses would signify the detection of periodic signals.
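
As an illustration of this kind of test, the sketch below builds an empirical 99% ellipse from fits to noise-only data and checks whether fitted (v_c, v_s) points fall outside it. It is a minimal sketch under stated assumptions: the ellipse comes from the sample covariance of simulated noise fits rather than from the analytical expressions developed in this paper, and the function names, seed, and 10 m/s test signal are hypothetical choices made for the example.

```python
import numpy as np

def fit_vc_vs(t, tau, data):
    """Least-squares fit of v_c*cos(wt) + v_s*sin(wt) + gamma to each row
    of `data`; returns the fitted (v_c, v_s) pairs, shape (n_sets, 2)."""
    w = 2.0 * np.pi / tau
    X = np.column_stack([np.cos(w * t), np.sin(w * t), np.ones_like(t)])
    coef, *_ = np.linalg.lstsq(X, data.T, rcond=None)
    return coef[:2].T

def mahalanobis2(points, cov_inv):
    """Squared distance of each point from the origin, measured in units
    of the noise covariance."""
    return np.einsum("ij,jk,ik->i", points, cov_inv, points)

def noise_ellipse(t, tau, sigma0, rng, n_sets=1000, contain=99.0):
    """Empirical ellipse containing `contain` percent of noise-only fits.
    ASSUMPTION: this stands in for the analytical ellipse of the text."""
    noise = rng.normal(0.0, sigma0, size=(n_sets, t.size))
    pts = fit_vc_vs(t, tau, noise)
    cov_inv = np.linalg.inv(np.cov(pts, rowvar=False))
    return cov_inv, np.percentile(mahalanobis2(pts, cov_inv), contain)

rng = np.random.default_rng(0)
T0, sigma0 = 12.0, 3.0
t = np.arange(144) / 12.0             # monthly sampling over 12 years
tau = 2.0 * T0                        # fitted period in the long-period regime
cov_inv, d2_max = noise_ellipse(t, tau, sigma0, rng)

# Hypothetical test case: a 10 m/s sinusoid of random phase plus noise.
phase = rng.uniform(0.0, 2.0 * np.pi, size=(500, 1))
data = 10.0 * np.sin(2.0 * np.pi * t / tau + phase)
data = data + rng.normal(0.0, sigma0, size=(500, t.size))
detected = mahalanobis2(fit_vc_vs(t, tau, data), cov_inv) > d2_max
print(f"fraction of fits outside the ellipse: {detected.mean():.2f}")
```

For real data one would fit v_c and v_s at each trial period and flag the periods at which the fitted point lies outside the corresponding ellipse.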

We have applied this K-φ analysis technique to the RV data of Walker et al. (1995). While
Nelson & Angel (1998) examined these data with an amplitude-only analysis and reported several
marginal detections, our technique yielded several clear detections of periodic signals. Although
we cannot say whether these periodicities represent planetary signals (as opposed to stellar cycles
or periodic systematic errors), it is clear that the use of amplitude and phase allows previously
unknown periodicities to be detected. In the future, we hope to apply this K-φ analysis to more
comprehensive RV surveys, and to detect long-period companions.

Fig. 1. — A plot of log(K_1/K_1s) versus log(τ/T_0) in the long-period regime of the radial velocity
technique (solid line). K_1 is the 99th percentile of K = v_c^2 + v_s^2. Here, we simulated
N = 1000 data sets with Gaussian noise of zero mean and σ_0 = 3 m s^-1. The data were sampled at
one month intervals for T_0 = 144 months. The dashed line shows the analytical K_1 predicted by
Equation 23.

Fig. 2. — Plots of v_c versus v_s and v_c versus γ in the short- and long-period regimes. In both
cases, the duration of the RV monitoring is T_0 = 12 yr. In the short-period regime, τ ≲ T_0, v_c,
v_s, and γ are uncorrelated (a),(c). In the long-period regime, strong correlations are seen. In
particular, we see that orbital phases of approximately ±90° are preferred (b) and that v_c and γ
are strongly anti-correlated (d).

Fig. 3. — A plot of log(V_c1/V_1s) versus log(τ/T_0). V_c1 is the value of |v_c| that is exceeded
in 1% of the simulations and V_1s is a normalizing factor (see Equation 14). See the caption of
Figure 1 for details of the simulations. The solid line shows the behavior of the simulated data,
and the dashed line represents the analytical expression from Equation 20.

Fig. 4. — A plot of log(V_s1/V_1s) versus log(τ/T_0). V_s1 is the value of |v_s| that is exceeded
in 1% of the simulations. See the caption to Figure 1 for details of the simulations. The solid
line shows the behavior of the simulated data, and the dashed line represents the analytic
expression from Equation 21. Bearing in mind that the vertical scale of this figure is quite fine,
we note that the analytical expression is a good fit to the simulated data.

Fig. 5. — A plot of log(Γ_1/Γ_1s) versus log(τ/T_0). Γ_1 is the value of |γ| that is exceeded in 1%
of the simulations. See the caption to Figure 1 for details of the simulations. The solid line
shows the behavior of the simulated data, and the dashed line represents the analytic expression
from Equation 22.

Fig. 6. — A plot of K_1s versus n_0 for evenly sampled (solid line) and randomly sampled simulated
data (dotted line). (For details of the simulations, see the caption to Figure 1.) The analytical
expression given by Equation 17 is indicated by the dashed line. All of these curves are normalized
by the value of K_1s for n_0 = 100, denoted by K_100. We note that for n_0 ≲ 10, the inclusion of γ
in the model equation is important for randomly sampled data, and that regardless of the chosen
sampling scheme, we suffer significant losses in sensitivity.

Fig. 7. — A plot of K_1 versus n_0 for evenly sampled (solid line) and randomly sampled simulated
data (dotted line). (For details of the simulations, see the caption to Figure 1.) The analytical
expression given by Equation 24 is indicated by the dashed line. All of these curves are normalized
by the value of K_1 for n_0 = 100, denoted by K_100. We note that the difference between even and
random sampling is not overly important, as we expect because the behavior of K_1 in the
long-period regime (Equation 23) was derived under the assumption that v_c, v_s, and γ are
covariant.

Fig. 8. — A plot of log(K_99/K_1) versus log(τ/T_0) as determined in the amplitude-only and K-φ
cases. For the amplitude-only case (solid line), K_99 represents the squared signal amplitude such
that 99% of the simulated data yield fitted K's greater than K_1. For the K-φ analysis, K_99 is the
necessary amplitude such that 99% of fitted K and φ, or equivalently {v_c, v_s}, lie outside of e_1
(the e_1 ellipse contains 99% of fits to noise-only data; for further details see the discussion
towards the end of Section ^). The 99th percentile of the slope analysis (discussed in Section 4.3)
is greater than the amplitude-only K_99 for all sampled periods, and thus is not shown here. At
each period, we carried out 10,000 simulations; the phase of the signal was assumed to be randomly
and evenly distributed over the range [0, 2π].

Fig. 9. — A plot of log(K_90/K_1) versus log(τ/T_0) as determined in the amplitude-only (solid
line), K-φ (dashed line), and slope-only (dotted line) cases. At sampled periods where the
amplitude-only K_90 is less than the slope-only K_90, only the former is shown. K_90 represents the
squared signal amplitude such that 90% of the simulated data yield fitted coefficients greater than
the type I sensitivity limits (at that period), given by K_1, e_1, and A_1 for the K-only, K-φ, and
a-only analyses, respectively. See the caption to Figure 8 for further details.

Fig. 10. — A plot of log(K_50/K_1) versus log(τ/T_0) as determined in the amplitude-only case
(solid line), the K-φ case (dashed line), and the slope-only case (dotted line). At sampled periods
where the amplitude-only K_50 is less than the slope-only K_50, only the former is shown. K_50
represents the squared signal amplitude such that 50% of the simulated data yield fitted
coefficients greater than the type I sensitivity limits (at that period), given by K_1, e_1, and
A_1 for the K-only, K-φ, and a-only analyses, respectively. See the caption of Figure 8 for further
details.

Fig. 11. — Distribution of the fitted parameters, {v_c, v_s}, for log(τ/T_0) = 1.0. A total of
N = 1000 simulations were performed for five different assumptions. (a) No signal is present.
(b)-(e) show data sets containing signals with an amplitude of √K_1, and φ = 0°, 45°, 90°, and
random phase, respectively. e_1 and the circle of radius K_1 are also plotted in (a)-(e). (a) shows
that 99% of the fitted data points lie within e_1. (b)-(e) demonstrate the importance of phase in
the determination of Type II errors: for signals with phases different than ±90°, the fitted
{v_c, v_s} have a significant component along the v_s-axis, causing them to lie outside of e_1.
Thus, the fraction of signals detected by the K-φ analysis is (100%, 100%, 33%, 95%) for cases
(b)-(e), while the corresponding fractions for the amplitude-only analysis are
(25%, 35%, 33%, 34%).